In this notebook, data from the Gapminder project is analyzed and visualized.
Most prominently, the relationship between GDP and CO2 emissions is investigated. Additionally, the relationship between continent and energy use is explored, and questions regarding imports of goods and services, population density and life expectancy are answered with a focus on specific geographic locations and time periods.
Feel free to explore the data in the interactive dataframe. (Unfortunately, I have not found a way to keep ipython widget callback functions working outside of the saved state. I am leaving the widget in anyway, in case you know more than I do and can tell me a trick to keep it functional here.)
Can a relationship between CO2 emissions and GDP per capita in 1962 be identified when comparing data from different countries?
To answer that question, all data from 1962 is extracted from the dataset.
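The extraction step can be sketched as follows. This is a minimal illustration, assuming the data lives in a pandas DataFrame with the column names shown in the tables of this notebook; the rows and country labels are made up and are not Gapminder values.

```python
import pandas as pd

# Toy stand-in for the Gapminder table; all values are made up for illustration.
data = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C"],
    "Year": [1962, 1967, 1962, 1967, 1962],
    "CO2 emissions (metric tons per capita)": [0.5, 0.7, 9.1, 10.2, None],
    "gdpPercap": [800.0, 950.0, 12000.0, 14000.0, 3000.0],
})

cols = ["CO2 emissions (metric tons per capita)", "gdpPercap"]

# Keep only the 1962 rows, then drop countries that are missing either value.
data_1962 = data[data["Year"] == 1962].dropna(subset=cols)

print(data_1962[["Country Name"] + cols])
```

Dropping rows with missing values before the analysis keeps the number of observations explicit, which matters when the count differs from year to year.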
In 1962 there was a clear positive relationship between carbon dioxide emissions and GDP per capita. As many carbon dioxide emitting technologies, such as cars, were much more costly at that time, it makes sense that citizens of higher-income countries were more likely to be able to afford them. The plot shows that the majority of the largest carbon producers were high-income European, North American and Oceanian countries, along with some Gulf countries.
Today the situation is different, with carbon dioxide emitting technologies being widely available and more affordable. The situation has also changed with the emergence of environmentally friendly technologies in recent years and with growing awareness of the dangers of carbon dioxide emissions. Many high-income countries have invested heavily in carbon-reducing technologies and are more likely to use environmentally friendly alternatives, so a trend reversal may be seen in the same plot with 21st century data.
In fact, closer inspection shows that these two parameters are strongly correlated in 1962.
The data shows a correlation of R=0.93, with a p-value of p << 0.001 (p=1.13e-46), which means that it is highly significant.
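A correlation coefficient and p-value of this kind can be computed with `scipy.stats.pearsonr`. The notebook does not show which function was used, so treat this as one plausible approach; the numbers below are invented for the example and will not reproduce the values reported above.

```python
from scipy import stats

# Made-up per-country values for a single year, not the Gapminder data.
gdp_per_capita = [800.0, 1500.0, 3000.0, 7000.0, 12000.0, 16000.0]
co2_per_capita = [0.2, 0.6, 1.9, 4.8, 9.5, 11.0]

# pearsonr returns the correlation coefficient and the two-sided p-value.
r, p_value = stats.pearsonr(gdp_per_capita, co2_per_capita)

print(f"R={r:.2f}, p={p_value:.2g}")
```

For a two-column DataFrame, `DataFrame.corr()` gives the same coefficient in matrix form, but without a p-value.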
The correlation between these two parameters peaked in 1967 at R=0.94 (p=3.4e-53). After that it decreased (as previously hypothesized): it dropped to R=0.84 by 1972, stayed at around R=0.8 through 2002 and reached a low of R=0.72 (p=9.2e-22) in 2007. The dataset ends in 2007, but it seems likely that the correlation has decreased further since then.
| | CO2 emissions (metric tons per capita) | gdpPercap |
|---|---|---|
| CO2 emissions (metric tons per capita) | 1.000000 | 0.926082 |
| gdpPercap | 0.926082 | 1.000000 |
Significant Correlation: In 1962 carbon emissions and GDP per capita had a correlation of: R=0.93 and a p value of 1.1e-46. Results were based on 108 observations
Significant Correlation: In 1967 carbon emissions and GDP per capita had a correlation of: R=0.94 and a p value of 3.4e-53. Results were based on 113 observations
Significant Correlation: In 1972 carbon emissions and GDP per capita had a correlation of: R=0.84 and a p value of 1.8e-32. Results were based on 116 observations
Significant Correlation: In 1977 carbon emissions and GDP per capita had a correlation of: R=0.79 and a p value of 2.8e-26. Results were based on 116 observations
Significant Correlation: In 1982 carbon emissions and GDP per capita had a correlation of: R=0.82 and a p value of 5.6e-29. Results were based on 116 observations
Significant Correlation: In 1987 carbon emissions and GDP per capita had a correlation of: R=0.81 and a p value of 3.9e-28. Results were based on 116 observations
Significant Correlation: In 1992 carbon emissions and GDP per capita had a correlation of: R=0.81 and a p value of 1.6e-29. Results were based on 122 observations
Significant Correlation: In 1997 carbon emissions and GDP per capita had a correlation of: R=0.81 and a p value of 8e-30. Results were based on 124 observations
Significant Correlation: In 2002 carbon emissions and GDP per capita had a correlation of: R=0.80 and a p value of 3.9e-29. Results were based on 125 observations
Significant Correlation: In 2007 carbon emissions and GDP per capita had a correlation of: R=0.72 and a p value of 9.2e-22. Results were based on 128 observations
Significant Correlation: In 1967 carbon emissions and GDP per capita had a correlation of: R=0.94 and a p value of 3.4e-53. Results were based on 113 observations
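The per-year results above can be produced by grouping the data by year and repeating the correlation test for each group. The sketch below assumes a `Year` column and the two column names from the correlation table; the data is a small invented example, so the printed values differ from the output above.

```python
import pandas as pd
from scipy import stats

co2 = "CO2 emissions (metric tons per capita)"
gdp = "gdpPercap"

# Made-up values: three years with four hypothetical countries each.
data = pd.DataFrame({
    "Year": [1962] * 4 + [1967] * 4 + [1972] * 4,
    gdp: [1000.0, 2000.0, 3000.0, 4000.0] * 3,
    co2: [1.0, 2.1, 2.9, 4.2,    # 1962: close to linear
          1.0, 2.0, 3.0, 4.0,    # 1967: perfectly linear
          2.0, 1.0, 4.0, 3.0],   # 1972: noisier
})

results = {}
for year, group in data.groupby("Year"):
    # Only countries with both values present enter the test.
    subset = group[[co2, gdp]].dropna()
    r, p = stats.pearsonr(subset[co2], subset[gdp])
    results[year] = (r, p, len(subset))
    print(f"In {year}: R={r:.2f}, p={p:.2g}, n={len(subset)}")

# Year with the highest correlation coefficient.
peak_year = max(results, key=lambda y: results[y][0])
print("Highest correlation in", peak_year)
```

Storing the number of observations next to each coefficient makes it easy to see that the sample size changes between years, as it does in the output above.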
Can a relationship between continent and energy use be identified for 1967?
To answer that question, the data is once more extracted from the dataset (this time from 1967).
As the relationship between a categorical (continent) and a continuous (energy use) parameter is being investigated and more than 2 groups are being compared, an ANOVA test would be most useful. However, an ANOVA test makes several assumptions, including independent observations, approximately normally distributed values within each group, and equal variances across the groups.
In order to decide which statistical test (ANOVA vs. Kruskal-Wallis) to use, it needs to be tested whether the assumptions necessary for an ANOVA test are met.
Levene's test is used to check whether the variances of the groups are equal.
Energy use (kg of oil equivalent per capita), summary statistics by continent:
| continent | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Africa | 199.0 | 698.516783 | 627.356473 | 9.715410 | 375.184208 | 449.521247 | 745.393302 | 3071.774832 |
| Americas | 188.0 | 1703.620453 | 2377.181918 | 219.075497 | 556.033108 | 749.029108 | 1384.585146 | 14608.009868 |
| Asia | 185.0 | 1867.280336 | 2590.043514 | 86.903767 | 345.370792 | 760.140852 | 1987.087308 | 12122.050603 |
| Europe | 256.0 | 3146.062066 | 1733.880414 | 350.101258 | 2073.999447 | 3027.931793 | 4034.557831 | 14746.031338 |
| Oceania | 20.0 | 3980.314420 | 1123.410756 | 1791.461322 | 3143.501420 | 4044.850674 | 4783.650230 | 5868.347097 |
Asia: variance 6708325
Europe: variance 3006341
Africa: variance 393576
Americas: variance 5650993
Oceania: variance 1262051
The p-value from Levene's test conducted on the data is 8e-10. This shows that the variances of the different groups differ significantly.
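A variance check of this kind can be sketched with `scipy.stats.levene`, which takes one sequence of values per group. The energy-use numbers below are invented for illustration, and the notebook does not state which `center` option was used; SciPy's default (`center='median'`) is assumed here.

```python
from statistics import variance
from scipy import stats

# Made-up energy-use values per continent (kg of oil equivalent per capita).
groups = {
    "Africa": [380.0, 410.0, 450.0, 470.0, 520.0],
    "Asia": [90.0, 350.0, 760.0, 2000.0, 9000.0],
    "Europe": [2000.0, 2600.0, 3000.0, 3500.0, 4000.0],
}

# Sample variance of each group, as in the listing above.
variances = {name: variance(values) for name, values in groups.items()}
for name, var in variances.items():
    print(f"{name}: variance {var:.0f}")

# Levene's test: the null hypothesis is that all groups have equal variances.
statistic, p_value = stats.levene(*groups.values())
print(f"Levene statistic={statistic:.2f}, p={p_value:.2g}")
```

A small p-value speaks against equal variances, which is the situation in which a rank-based alternative to ANOVA becomes the safer choice.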
As at least one assumption required to conduct an ANOVA test is not met, a Kruskal-Wallis test is conducted instead.
The Kruskal-Wallis H-test conducted on the data yields a p-value of 1e-67 with an H statistic of 318.68.
This shows that energy consumption varies significantly between continents.
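The Kruskal-Wallis H-test is available as `scipy.stats.kruskal`. The sketch below uses three small invented groups that do not overlap at all, so the H statistic can be checked by hand: with rank sums 15, 40 and 65 for 15 observations, H = 12/(15·16) · (15² + 40² + 65²)/5 − 3·16 = 12.5.

```python
from scipy import stats

# Made-up, fully separated groups (no ties), not the Gapminder data.
africa = [300.0, 380.0, 450.0, 520.0, 700.0]
asia = [760.0, 900.0, 1200.0, 1900.0, 2400.0]
europe = [2600.0, 3000.0, 3400.0, 4000.0, 4500.0]

# The test works on ranks, so it does not assume equal variances or normality.
h_statistic, p_value = stats.kruskal(africa, asia, europe)

print(f"H={h_statistic:.2f}, p={p_value:.2g}")
```

A significant result only says that at least one group differs; it does not say which one, which is why a pairwise follow-up test is needed.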
A Tukey's test will bring more clarity about which groups differ from each other:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=============================================================
group1 group2 meandiff p-adj lower upper reject
-------------------------------------------------------------
Africa Americas 1005.1037 0.001 466.8321 1543.3753 True
Africa Asia 1168.7636 0.001 628.2524 1709.2747 True
Africa Europe 2447.5453 0.001 1947.3833 2947.7072 True
Africa Oceania 3281.7976 0.001 2040.3398 4523.2555 True
Americas Asia 163.6599 0.9 -384.4165 711.7363 False
Americas Europe 1442.4416 0.001 934.1136 1950.7696 True
Americas Oceania 2276.694 0.001 1031.9237 3521.4642 True
Asia Europe 1278.7817 0.001 768.0828 1789.4806 True
Asia Oceania 2113.0341 0.001 867.2937 3358.7744 True
Europe Oceania 834.2524 0.3424 -394.5188 2063.0235 False
-------------------------------------------------------------
Tukey's test shows that energy consumption differs between most pairs of continents, with two exceptions.
Energy consumption did not differ significantly between the Americas and Asia, or between Europe and Oceania.
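The table above has the format printed by statsmodels' `pairwise_tukeyhsd`. As an alternative that needs only SciPy (version 1.8 or later), the same kind of pairwise comparison can be sketched with `scipy.stats.tukey_hsd`; this is a swapped-in function, not necessarily the one used in the notebook, and the values are invented.

```python
from scipy import stats

# Made-up groups: the first two are similar, the third is far away.
names = ["Africa", "Americas", "Europe"]
africa = [400.0, 450.0, 500.0, 550.0, 600.0]
americas = [420.0, 470.0, 510.0, 560.0, 610.0]
europe = [3000.0, 3100.0, 3200.0, 3300.0, 3400.0]

result = stats.tukey_hsd(africa, americas, europe)

# result.pvalue is a square matrix; entry [i, j] compares group i with group j.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = result.pvalue[i, j]
        print(f"{names[i]} vs {names[j]}: p-adj={p:.3g}, reject={p < 0.05}")
```

In this toy example only the comparisons involving the third group are rejected, mirroring how the table above separates pairs that differ from pairs that do not.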
In order to compare imports of goods and services (% of GDP) between Europe and Asia, two subsets of data have to be created and compared to each other:
all data from Asian countries for the years 1990 - 2007 makes up the first subset, and the corresponding data from European countries makes up the second.
As there are only 2 groups to be compared, a t-test is used. It is assumed that the two subsets are independent of each other.
| | Country Name | Year | Agriculture, value added (% of GDP) | CO2 emissions (metric tons per capita) | Domestic credit provided by financial sector (% of GDP) | Electric power consumption (kWh per capita) | Energy use (kg of oil equivalent per capita) | Exports of goods and services (% of GDP) | Fertility rate, total (births per woman) | GDP growth (annual %) | Imports of goods and services (% of GDP) | Industry, value added (% of GDP) | Inflation, GDP deflator (annual %) | Life expectancy at birth, total (years) | Population density (people per sq. km of land area) | Services, etc., value added (% of GDP) | pop | continent | gdpPercap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Afghanistan | 1992 | NaN | 0.101375 | NaN | NaN | NaN | NaN | 7.502 | NaN | NaN | NaN | NaN | 51.362927 | 21.054483 | NaN | 16317921.0 | Asia | 649.341395 |
| 7 | Afghanistan | 1997 | NaN | 0.060798 | NaN | NaN | NaN | NaN | 7.636 | NaN | NaN | NaN | NaN | 54.017829 | 27.623273 | NaN | 22227415.0 | Asia | 635.341351 |
| 8 | Afghanistan | 2002 | 38.471940 | 0.041129 | NaN | NaN | NaN | 32.386719 | 7.273 | NaN | 65.287704 | 23.714097 | NaN | 55.857195 | 32.912231 | 37.813963 | 25268405.0 | Asia | 726.734055 |
| 9 | Afghanistan | 2007 | 30.622854 | 0.087858 | 0.535181 | NaN | NaN | 17.823714 | 6.437 | 13.740205 | 58.350047 | 27.344703 | 22.382016 | 57.833829 | 39.637202 | 42.032443 | 31889923.0 | Asia | 974.580338 |
| 156 | Bahrain | 1992 | 0.909897 | 20.855156 | 25.401040 | 18509.874739 | 10844.755025 | 84.381992 | 3.466 | 6.689998 | 96.742045 | 37.211498 | -3.532652 | 72.914732 | 736.264789 | 61.878605 | 529491.0 | Asia | 19035.579170 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2552 | Vietnam | 2007 | 18.655100 | 1.245243 | 88.234838 | 738.471951 | 540.081953 | 70.517875 | 1.911 | 7.129504 | 84.087509 | 38.511625 | 9.630226 | 74.483854 | 271.611249 | 42.833275 | 85262356.0 | Asia | 2441.576404 |
| 2563 | West Bank and Gaza | 1992 | NaN | NaN | NaN | NaN | NaN | NaN | 6.608 | NaN | NaN | NaN | NaN | 68.717073 | 359.400498 | NaN | 2104779.0 | Asia | 6017.654756 |
| 2564 | West Bank and Gaza | 1997 | 13.096001 | 0.146327 | NaN | NaN | NaN | 17.296133 | 5.907 | 23.936312 | 73.041119 | 26.643483 | -3.833899 | 70.140390 | 449.587708 | 60.260517 | 2826046.0 | Asia | 7110.667619 |
| 2565 | West Bank and Gaza | 2002 | 8.769075 | 0.375598 | 4.869953 | NaN | NaN | 13.437201 | 5.105 | -1.417431 | 62.815682 | 21.910406 | 1.487861 | 71.069976 | 510.859302 | 69.320520 | 3389578.0 | Asia | 4515.487575 |
| 2566 | West Bank and Gaza | 2007 | 7.441636 | 0.665297 | 6.261647 | NaN | NaN | 19.366849 | 4.627 | -1.727004 | 77.810672 | 23.234322 | 5.198904 | 71.747049 | 580.481063 | 69.324043 | 4018332.0 | Asia | 3025.349798 |
104 rows × 19 columns
Ttest_indResult(statistic=-1.418525688795887, pvalue=0.157519693255542)
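The result above has the form returned by `scipy.stats.ttest_ind`. A minimal sketch of the subsetting and the test follows, assuming the column names from the table above; the rows are invented, and missing values are dropped before testing because the import column contains `NaN` entries.

```python
import pandas as pd
from scipy import stats

imports = "Imports of goods and services (% of GDP)"

# Made-up rows; the 1987 rows fall outside the 1990 - 2007 window.
data = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Europe"] * 4 + ["Asia", "Europe"],
    "Year": [1992, 1997, 2002, 2007] * 2 + [1987, 1987],
    imports: [30.0, 45.0, None, 60.0, 35.0, 40.0, 50.0, 59.0, 20.0, 25.0],
})

# Restrict to the years of interest, then split by continent.
recent = data[data["Year"].between(1990, 2007)]
asia = recent.loc[recent["continent"] == "Asia", imports].dropna()
europe = recent.loc[recent["continent"] == "Europe", imports].dropna()

# Independent two-sample t-test (Student's version, equal variances assumed).
t_statistic, p_value = stats.ttest_ind(asia, europe)
print(f"t={t_statistic:.3f}, p={p_value:.3f}")
```

Passing `equal_var=False` would give Welch's variant, which does not assume equal variances in the two groups.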
Based on the results of the t-test, there was no significant difference between Europe and Asia in terms of imports of goods and services measured as a % of GDP. Both continents contain strong exporters as well as strong importers, which seem to balance each other out, so that the import shares of the two continents are comparable.
Comparing individual countries would make the results differ to a much larger extent.
There is one country that is clearly an outlier: Singapore.
In order to find the country with the highest population density in each year, the dataset first needs to be divided into one subset per year. In a second step, the country with the highest value in the column "Population density (people per sq. km of land area)" needs to be identified for each subset.
The results are as follows:
In 1962, the country with the highest population density is: Monaco
In 1967, the country with the highest population density is: Monaco
In 1972, the country with the highest population density is: Macao SAR, China
In 1977, the country with the highest population density is: Monaco
In 1982, the country with the highest population density is: Monaco
In 1987, the country with the highest population density is: Macao SAR, China
In 1992, the country with the highest population density is: Macao SAR, China
In 1997, the country with the highest population density is: Macao SAR, China
In 2002, the country with the highest population density is: Macao SAR, China
In 2007, the country with the highest population density is: Monaco
Monaco (1962, 1967, 1977, 1982 and 2007) and the autonomous Chinese region Macao (1972, 1987, 1992, 1997 and 2002) take turns as the country with the highest population density. Both locations are microstates or city-states, which explains their high population density: they have the density typical of cities in most countries, without any large rural areas.
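The two steps described above (split by year, then pick the maximum) can be combined with `groupby` and `idxmax` in pandas. The sketch below uses placeholder country names and invented densities, with the column names taken from the table earlier in this notebook.

```python
import pandas as pd

density = "Population density (people per sq. km of land area)"

# Hypothetical countries and made-up densities for two years.
data = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"] * 2,
    "Year": [1962] * 3 + [1972] * 3,
    density: [100.0, 90.0, 30.0, 110.0, 120.0, 35.0],
})

# idxmax returns, for each year, the row label of the highest density.
idx = data.groupby("Year")[density].idxmax()
densest = data.loc[idx, ["Year", "Country Name"]]

for year, country in zip(densest["Year"], densest["Country Name"]):
    print(f"In {year}, the country with the highest population density is: {country}")
```

Using `idxmax` keeps the full row of the winning country available, so other columns can be inspected without a second lookup.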
To find the countries with the largest increase in life expectancy, the increase from 1962 to 2007 has to be calculated for every country. Then, in a second step, these calculated values need to be compared to each other and the maximum values need to be identified.
The Maldives (+37y), Bhutan (+33y) and Timor-Leste (+31y) had the highest increases in life expectancy in the time frame from 1962 - 2007.
All three of these countries had disproportionately low life expectancies in 1962, which left room for large increases in the following years.
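The calculation can be sketched by pivoting the data so that each year becomes a column and then subtracting the 1962 column from the 2007 column. Country names and values below are placeholders, not Gapminder figures; the column name is taken from the table earlier in this notebook.

```python
import pandas as pd

life = "Life expectancy at birth, total (years)"

# Hypothetical countries with made-up life expectancies.
data = pd.DataFrame({
    "Country Name": ["Country A", "Country A", "Country B",
                     "Country B", "Country C", "Country C"],
    "Year": [1962, 2007] * 3,
    life: [40.0, 75.0, 70.0, 80.0, 50.0, 72.0],
})

# One row per country, one column per year.
wide = data.pivot(index="Country Name", columns="Year", values=life)

# Increase over the period, largest first.
increase = (wide[2007] - wide[1962]).sort_values(ascending=False)
print(increase.head(3))
```

In the toy data the country that starts lowest gains the most, which is the same pattern described for the three countries above.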